27 research outputs found

    Personalized Web Search via Query Expansion based on User’s Local Hierarchically-Organized Files

    Users of Web search engines generally express information needs with short and ambiguous queries, leading to irrelevant results. Personalized search methods improve users’ experience by automatically reformulating queries before sending them to the search engine, or by rearranging the received results, according to the users’ specific interests. A user profile is often built from previous queries, clicked results or, more generally, the user’s browsing history; different topics must be distinguished in order to obtain an accurate profile. Users commonly organize files stored locally in sub-directories into a coherent taxonomy that matches their own topics of interest, but only a few methods leverage this potentially useful source of knowledge. We propose a novel method where a user profile is built from those files, specifically considering their consistent arrangement in directories. A bag of keywords is extracted for each directory from the text documents within it. We can then infer the topic of each query and expand it by adding the corresponding keywords, in order to obtain a more targeted formulation. Experiments are carried out using benchmark data through a repeatable systematic process, in order to evaluate objectively how much our method can improve the relevance of query results when applied on top of a third-party search engine
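
A minimal sketch of the directory-based query expansion described in this abstract, under simplifying assumptions that are not taken from the paper: keywords are simply the most frequent tokens per directory, the query topic is the directory sharing the most terms with the query, and the names `directory_keywords` and `expand_query` are hypothetical.

```python
from collections import Counter


def directory_keywords(dir_docs, top_k=3):
    """Map each directory to its top_k most frequent tokens.

    dir_docs: {directory name: [list of token lists, one per document]}.
    Plain frequency is an assumption; the paper's keyword extraction may differ.
    """
    profile = {}
    for directory, docs in dir_docs.items():
        counts = Counter(token for doc in docs for token in doc)
        profile[directory] = [token for token, _ in counts.most_common(top_k)]
    return profile


def expand_query(query, profile):
    """Append the keywords of the best-matching directory to the query."""
    terms = query.lower().split()
    # Pick the directory whose keywords overlap most with the query terms.
    best = max(profile, key=lambda d: len(set(terms) & set(profile[d])))
    if not set(terms) & set(profile[best]):
        return query  # no directory matches: leave the query untouched
    extra = [k for k in profile[best] if k not in terms]
    return " ".join(terms + extra)
```

With one directory about cars and one about animals, a query such as "jaguar engine" would be expanded with the remaining car-related keywords, while a query that matches no directory is returned unchanged.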

    Learning Methods and Algorithms for Semantic Text Classification across Multiple Domains

    Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from large amounts of data. As most data comes in the form of unstructured natural-language text, research on text mining is currently very active and addresses practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into predefined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a sizeable training set and notable computational effort. Methods for cross-domain text categorization have been proposed, which leverage a set of labeled documents of one domain to classify those of another. Most methods use advanced statistical techniques, usually involving parameter tuning. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between their respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvement, but models generated from one domain can be effectively reused in a different one

    Learning to Predict the Stock Market Dow Jones Index Detecting and Mining Relevant Tweets

    Stock market analysis is a primary interest for finance and a challenging task that has always attracted many researchers. Historically, this task was accomplished by means of trend analysis, but in recent years text mining has emerged as a promising way to predict stock price movements. Indeed, previous works showed not only a strong correlation between financial news and the movements of stock prices, but also that the analysis of social network posts can help to predict them. These latest methods are mainly based on complex techniques to extract the semantic content and/or the sentiment of the social network posts. In contrast, in this paper we describe a method to predict the Dow Jones Industrial Average (DJIA) price movements based on simpler mining techniques and text similarity measures, in order to detect and characterise relevant tweets that lead to increments and decrements of the DJIA. Considering the high level of noise in social network data, we also introduce a noise detection method based on a two-step classification. We tested our method on 10 million Twitter posts spanning one year, achieving an accuracy of 88.9% in the Dow Jones daily prediction, which is, to the best of our knowledge, the best result among the approaches in the literature based on social networks

    On Deep Learning in Cross-Domain Sentiment Classification

    Cross-domain sentiment classification consists of distinguishing positive and negative reviews of a target domain by using knowledge extracted and transferred from a heterogeneous source domain. Cross-domain solutions aim at overcoming the costly pre-classification of each new training set by human experts. Despite the potential business relevance of this research thread, the existing ad hoc solutions still do not scale to real, large text sets. Scalable Deep Learning techniques have been effectively applied to in-domain text classification, by training and categorising documents belonging to the same domain. This work analyses the cross-domain efficacy of a well-known unsupervised Deep Learning approach for text mining, called Paragraph Vector, comparing its performance with a method based on Markov Chain developed ad hoc for cross-domain sentiment classification. The experiments show that, once enough data is available for training, Paragraph Vector achieves accuracy equivalent to Markov Chain both in-domain and cross-domain, despite having no explicit transfer learning capability. The outcome suggests that combining Deep Learning with transfer learning techniques could be a breakthrough for ad hoc cross-domain sentiment solutions in big data scenarios. This opinion is supported by a very simple multi-source experiment we ran to improve transfer learning, which increases the accuracy of cross-domain sentiment classification

    Smart city pilot projects using LoRa and IEEE802.15.4 technologies

    Information and Communication Technologies (ICTs), through wireless communications and the Internet of Things (IoT) paradigm, are the enabling keys for transforming traditional cities into smart cities, since they provide the core infrastructure behind public utilities and services. However, to be effective, IoT-based services could require different technologies and network topologies, even when addressing the same urban scenario. In this paper, we highlight this aspect and present two smart city testbeds developed in Italy. The first one concerns a smart infrastructure for public lighting and relies on a heterogeneous network using the IEEE 802.15.4 short-range communication technology, whereas the second one addresses smart-building applications and is based on the LoRa low-rate, long-range communication technology. The smart lighting scenario is discussed providing the technical details and the economic benefits of a large-scale (around 3000 light poles) flexible and modular implementation of a public lighting infrastructure, while the smart-building testbed is investigated, through measurement campaigns and simulations, assessing the coverage and the performance of the LoRa technology in a real urban scenario. Results show that a proper parameter setting is needed to cover large urban areas while maintaining the airtime sufficiently low to keep packet losses at satisfactory levels

    Association of kidney disease measures with risk of renal function worsening in patients with type 1 diabetes

    Background: Albuminuria has been classically considered a marker of kidney damage progression in diabetic patients and it is routinely assessed to monitor kidney function. However, the role of a mild GFR reduction in the development of stage ≥3 CKD has been less explored in type 1 diabetes mellitus (T1DM) patients. The aim of the present study was to evaluate the prognostic role of kidney disease measures, namely albuminuria and reduced GFR, in the development of stage ≥3 CKD in a large cohort of patients affected by T1DM. Methods: A total of 4284 patients affected by T1DM followed up at 76 diabetes centers participating in the Italian Association of Clinical Diabetologists (Associazione Medici Diabetologi, AMD) initiative constituted the study population. Urinary albumin excretion (ACR) and estimated GFR (eGFR) were retrieved and analyzed. The incidence of stage ≥3 CKD (eGFR < 60 mL/min/1.73 m2) or eGFR reduction > 30% from baseline was evaluated. Results: The mean estimated GFR was 98 ± 17 mL/min/1.73 m2 and the proportion of patients with albuminuria was 15.3% (n = 654) at baseline. About 8% (n = 337) of patients developed one of the two renal endpoints during the 4-year follow-up period. Age, albuminuria (micro or macro) and baseline eGFR < 90 ml/min/m2 were independent risk factors for stage ≥3 CKD and renal function worsening. When compared to patients with eGFR > 90 mL/min/1.73 m2 and normoalbuminuria, those with albuminuria at baseline had a 1.69 greater risk of reaching stage 3 CKD, while patients with mild eGFR reduction (i.e. eGFR between 90 and 60 mL/min/1.73 m2) showed a 3.81 greater risk, which rose to 8.24 for those patients with both albuminuria and mild eGFR reduction at baseline. Conclusions: Albuminuria and eGFR reduction represent independent risk factors for incident stage ≥3 CKD in T1DM patients. The simultaneous occurrence of reduced eGFR and albuminuria has a synergistic effect on renal function worsening

    Cross-domain sentiment classification via polarity-driven state transitions in a Markov model

    Nowadays understanding people’s opinions is the way to success, whatever the goal. Sentiment classification automates this task, assigning a positive, negative or neutral polarity to free text concerning services, products, TV programs, and so on. Learning accurate models requires a considerable effort from human experts, who have to properly label text data. To reduce this burden, cross-domain approaches are advisable in real cases, and transfer learning between source and target domains is usually required due to language heterogeneity. This paper introduces some variants of our previous work [1], where both transfer learning and sentiment classification are performed by means of a Markov model. While splitting documents into sentences does not perform well on common benchmarks, using polarity-bearing terms to drive the classification process shows encouraging results, given that our Markov model only considers single terms without further context information

    Reliability evaluation of mechanical components using maintenance monitoring systems, Parte I

    Cross-domain text classification deals with predicting topic labels for documents in a target domain by leveraging knowledge from pre-labeled documents in a source domain, with different terms or different distributions thereof. Methods exist to address this problem by re-weighting documents from the source domain to transfer them to the target one or by finding a common feature space for documents of both domains; they often require the combination of complex techniques, leading to a number of parameters which must be tuned for each dataset to yield optimal performance. We present a simpler method based on creating explicit representations of topic categories, which can be compared for similarity to those of documents. Category representations are initially built from relevant source documents, then iteratively refined by considering the most similar target documents, with relatedness being measured by a simple regression model based on cosine similarity, built once at the beginning. This is expected to yield accurate representations for categories in the target domain, which are used to classify the documents therein. Experiments on common benchmark text collections show that this approach obtains results better than or comparable to other methods, using fixed empirical values for its few parameters
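
The iterative refinement of category representations described in this abstract can be illustrated roughly as follows. This is a simplified sketch, not the authors' implementation: the regression model on cosine similarity is replaced here by a plain cosine threshold, documents are raw term-count dictionaries, and `adapt_centroids`, `threshold` and `max_iter` are hypothetical names and values.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(docs):
    """Sum the term counts of a group of documents into one profile."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    return dict(total)


def adapt_centroids(source, target, threshold=0.3, max_iter=10):
    """source: {category: [term-count dicts]}; target: [term-count dicts]."""
    # Initial category profiles come from labeled source documents only.
    profiles = {c: centroid(docs) for c, docs in source.items()}
    assignment = None
    for _ in range(max_iter):
        # Assign every target document to its most similar profile.
        new_assignment = [
            max(profiles, key=lambda c: cosine(doc, profiles[c])) for doc in target
        ]
        if new_assignment == assignment:
            break  # labels are stable: stop iterating
        assignment = new_assignment
        # Rebuild each profile from sufficiently similar target documents.
        for c in profiles:
            chosen = [
                d for d, a in zip(target, assignment)
                if a == c and cosine(d, profiles[c]) >= threshold
            ]
            if chosen:
                profiles[c] = centroid(chosen)
    return profiles, assignment
```

In this sketch, target documents that share no term with the source profiles can still be classified in later iterations, once the profiles have absorbed target-domain vocabulary from the documents matched earlier.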

    A study on term weighting for text categorization: A novel supervised variant of tf.idf

    Within text categorization and other data mining tasks, the use of suitable methods for term weighting can bring a substantial boost in effectiveness. Several term weighting methods have been presented in the literature, based on assumptions commonly derived from observing the distribution of words in documents. For example, the idf assumption states that words appearing in many documents are usually not as important as less frequent ones. In contrast to tf.idf and other weighting methods derived from information retrieval, schemes proposed more recently are supervised, i.e. based on knowledge of the membership of training documents in categories. We propose here a supervised variant of the tf.idf scheme, based on computing the usual idf factor without considering documents of the category to be recognized, so that the importance of terms frequently appearing only within it is not underestimated. A further proposed variant is additionally based on relevance frequency, considering occurrences of words within the category itself. In extensive experiments on two recurring text collections with several unsupervised and supervised weighting schemes, we show that the ones we propose generally perform better than or comparably to the others in terms of accuracy, using two different learning methods
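
The core idea of the supervised variant, computing idf while ignoring documents of the category to be recognized, might be sketched as follows. The exact formula, including the `+ 1` smoothing and the `max(1, ...)` guard, is an assumption made for this illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def idf_excluding_category(docs, labels, category):
    """idf computed only over training documents outside `category`.

    docs: list of token lists; labels: one category label per document.
    The smoothing used here is an assumption, not taken from the abstract.
    """
    outside = [set(d) for d, label in zip(docs, labels) if label != category]
    n_outside = len(outside)
    # Document frequency of each term among out-of-category documents.
    df = Counter(t for d in outside for t in d)
    vocabulary = {t for d in docs for t in d}
    return {t: math.log((n_outside + 1) / max(1, df[t])) for t in vocabulary}


def tf_idf_weights(doc, idf):
    """Weight each term of a document by raw term frequency times idf."""
    tf = Counter(doc)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}
```

A term that is frequent inside the category but rare elsewhere keeps a high idf under this scheme, whereas the standard idf would penalize it for appearing in many documents overall.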